Efficient Splice Site Prediction with Context-Sensitive Distance Kernels

Authors

  • Bernard Manderick
  • Feng Liu
  • Bram Vanschoenwinkel
Abstract

This paper compares different context-sensitive kernel functions for splice site prediction with a support vector machine. Four types of kernel functions are used: linear, polynomial, radial basis function, and negative distance-based kernels. Domain knowledge can be built into the kernels by incorporating statistical measures or by directly plugging in distance functions defined on the splice site instances. The experimental results show that the radial basis function-based kernels achieve the best accuracy. However, because classification speed is of crucial importance to the splice site prediction system, this kernel is computationally too expensive. Nevertheless, in general, incorporating domain knowledge not only improves classification accuracy but also reduces model complexity, which in turn increases classification speed.
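
The idea of plugging a distance function into a kernel can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not specify the distance functions or parameters, so the position-weighted overlap distance, the function names, and the `gamma`, `beta`, and `weights` parameters below are all assumptions chosen for illustration.

```python
import math


def weighted_overlap_distance(x, y, weights=None):
    """Sum the weights of the positions where two equal-length sequences differ.

    Assumption: a simple overlap (mismatch) distance; the optional per-position
    weights stand in for the statistical measures mentioned in the abstract.
    """
    if len(x) != len(y):
        raise ValueError("sequences must have equal length")
    if weights is None:
        weights = [1.0] * len(x)
    return sum(w for a, b, w in zip(x, y, weights) if a != b)


def rbf_distance_kernel(x, y, gamma=0.1, weights=None):
    """Radial basis function-style kernel: exp(-gamma * d(x, y))."""
    return math.exp(-gamma * weighted_overlap_distance(x, y, weights))


def negative_distance_kernel(x, y, beta=1.0, weights=None):
    """Negative distance kernel: -(d(x, y) ** beta)."""
    return -(weighted_overlap_distance(x, y, weights) ** beta)


def gram_matrix(sequences, kernel):
    """Evaluate the kernel on all pairs, e.g. for an SVM with a precomputed kernel."""
    return [[kernel(a, b) for b in sequences] for a in sequences]


if __name__ == "__main__":
    # Hypothetical context windows around candidate splice sites.
    windows = ["ACGTAG", "ACGTTG", "TTGTAG"]
    for row in gram_matrix(windows, rbf_distance_kernel):
        print(row)
```

A Gram matrix built this way could be handed to any SVM implementation that accepts precomputed kernels. The exponential in the radial basis function variant is evaluated for every pair of sequences, which is one plausible source of the extra classification cost the abstract mentions.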

Similar resources

Splice Site Prediction using Support Vector Machines with Context-Sensitive Kernel Functions

This paper focuses on the use of support vector machines on a typical context-dependent classification task, splice site prediction. For this type of problem, it has been shown that a context-based approach should be preferred over a transformation approach because the former approach can easily incorporate statistical measures or directly plug sensitivity information into distance functions. ...

Accurate Splice Site Detection for Caenorhabditis elegans

We propose a new system for predicting the splice form of Caenorhabditis elegans genes. As a first step we generate a clean set of genes from available expressed sequence tags (EST) and complete complementary (cDNA) sequences. From all such genes we then generate potential acceptor and donor sites as they would be required by any gene finder. This leads to a clean set of true and decoy splice si...

Modelling splice sites with locality-sensitive sequence features

The splice sites are essential for pre-mRNA maturation and crucial for Splice Site Modelling (SSM); however, there are gaps between the splicing signals and the computationally identified sequence features. In this paper, the Locality Sensitive Features (LSFs) are proposed to reduce the gaps by homogenising their contexts. Under the skewness-kurtosis based statistics and data analysis, SSM attr...

Splice site prediction in Arabidopsis thaliana pre-mRNA by combining local and global sequence information.

Artificial neural networks have been combined with a rule based system to predict intron splice sites in the dicot plant Arabidopsis thaliana. A two step prediction scheme, where a global prediction of the coding potential regulates a cutoff level for a local prediction of splice sites, is refined by rules based on splice site confidence values, prediction scores, coding context and distances b...

Identification of alternative 5'/3' splice sites based on the mechanism of splice site competition

Alternative splicing plays an important role in regulating gene expression. Currently, most efficient methods use expressed sequence tags or microarray analysis for large-scale detection of alternative splicing. However, it is difficult to detect all alternative splice events with them because of their inherent limitations. Previous computational methods for alternative splicing prediction coul...

Publication date: 2007